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; Abstract 

In Fernandez-Duran (2004), a new family of circular distributions based on nonneg- 
ative trigonometric sums (NNTS models) is developed. Because the parameter space 
of this family is the surface of the hypersphere, an efficient Newton-like algorithm on 
manifolds is generated in order to obtain the maximum likelihood estimates of the 
' parameters. 
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1 Introduction 

The probability density function, f(0;c), of a circular random variable G (0, 2tt] must be 
nonnegative and periodic (f(0 + 2kn;c) = f(6;c)) for any integer k where c is the vector 
of parameters. Practical examples of circular random variables include the wind directions 
at different monitoring stations, the directions taken by an animal, the times at which a 
person conducts a daily activity, the time of occurrence of different events, and many others. 
Based on the results of Fejer (1915), Fernandez-Duran (2004) derived a family of circular 
distributions based on nonnegative trigonometric sums (see also Fernandez-Duran, 2007). 
In short, the nonnegative trigonometric sum is expressed as a squared norm of a complex 
number. The circular density function based on nonnegative trigonometric sums (NNTS 
density) is expressed as 



f(0;M,c) 



ai 



k=0 



M M 



k=0 1=0 



Note that i = y— T and, Ck = c r k + ic c k are complex numbers for k = 0, . . . , M, and, 
Cfc = c r k — ic c k is the conjugate of Ck- To integrate to one, it is necessary to impose the 
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following constraint in the c parameters. 

M 
k=0 

Note that c c0 = and c r0 > 0; i.e., Cq is a nonnegative real number. Thus, the c parameter 
space corresponds to the surface of a 2M+1 dimensional hypersphere. This family of circular 
distributions has the advantage of being able to fit datasets that present multimodality 
and/or skewness because the density function can be expressed as a mixture of multimodal 
circular distributions. The total number of c free parameters is equal to 2M. 

The main objective of this paper is to develop an efficient Newton-like optimization 
algorithm on the surface of a hypersphere that corresponds to a Riemann manifold, in order 
to obtain the maximum likelihood estimates of the c parameters. 

The paper is divided into five sections, including this introduction. The second section 
presents a convenient, alternative way to express likelihood functions for continuous and 
grouped data in the univariate case. Given these convenient expressions for likelihood func- 
tions, in the third section an efficient Newton-like algorithm is developed for maximizing 
the log-likelihood function on the surface of the hypersphere. The proposed algorithm is a 
particular case of a Newton-like algorithm for scalar functions on Stiefel manifolds (Absil 
et al, 2008, Manton, 2002, Balogh, 2004). In the fourth section, some applications of the 
proposed algorithms to real continuous and grouped datasets are presented. Finally, the 
conclusions of the present work are presented in the fifth section. 



2 Likelihood Functions 



2.1 Continuous Data 

Let 9\, 62, ■ ■ ■ , 9 n be a random sample of univariate circular random variables from a popula- 
tion with density function f(9; M, c), which is a member of the NNTS family with parameters 
c and M. The density function of 9j is given by 



f(0f,M,c) 



M 



k=0 



M M 

= ££ 

k=0 1=0 



CkQe 



i(k-l)9j 



which can be written in the following quadratic form: 

f{9j] M,c) = c^-ef c = c H EjC. 
Note that c— (c , c±, . . . , cm) t , e_j = (1, e -1 ^', e~"'"J , e^"^, . . . , e 



(3) 



(4) 



(1, e~ tdj , e~ 2tdj , e~ 3i6, J , . . . , e ~ Mj6 'j) T , and c H indicate the 
Hermitian transpose of the vector c that corresponds to the transpose and conjugate of the 
vector of complex numbers c, and c T is the transpose of c. Then, the likelihood for a random 
sample 9 1: . . . , 9 n , denoted by L(M,c | 0i, . . . , 9 n ), is calculated as 



L(M,c I 0i, ... A) = Y[c H e k e%c = ]Jc H E k c 

k=l k=l 

and the corresponding log-likelihood function is 

n n 

l(M,c\9 1 ,..., 9 n ) = YsMc H e k e%c) = ]T \n(c H E k c). 



(5) 



(6) 
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2.2 Grouped Data 

Let (a,j, bj] for j — 1, . . . , Q be a partition of the interval (0, 2n], i.e., (0, 2ty] = l)® =1 (aj, bj] and 
(dj, bj] fl (ofc, 6fc] = for j 7^ k, and let iVi, AT 2 , . . . , Aq be the total number of observations 
in each of the intervals in the partition. Let A^ be the total number of observations in the 
interval (a k , b k ] for k — 1, . . . , Q. This type of data is called grouped or incidence data. The 
likelihood function is 

Q 

L(M,c \N 1 ,...,N Q ) = H (F(b k ; M,c) - F(a k ; M,c)f k (7) 

k=l 

where F(b k ; M, c) is the accumulated distribution function of the NNTS density at b k . Note 
that the accumulated distribution function is obtained as 

F(b k ;M,c) = J f(9;M,c)d0 = J c H EcdO = c H I J EdOJ c (8) 

where ^ f^ k EdQ^j integrates each element of the matrix E. The elements of matrix E are of 

the form e ird for r = —M, —M + 1, . . . , 0, . . . , M — 1, M. The value of the integral is equal 
to ^(1 — e irbk ) for r ^ and equal to b k for r = 0. The likelihood can again be expressed as 
a product of quadratic forms with respect to c. 



3 The Newton-like Algorithm 

Because the c parameter space corresponds to the surface of the hypersphere, to obtain the 
maximum likelihood estimates, it is possible to apply a Newton-like optimization algorithm 
on manifolds (Absil et al, 2008). Basically, a smooth manifold is a surface that can be 
approximated locally by a hyperplane. For a point on the manifold, the approximating 
hyperplane is known as the tangent space. Then, a real function on a manifold can be 
maximized by searching for optima in the directions of movement on the tangent space and 
reprojecting onto the manifold. In differential geometry, the reprojection operation is called 
a retraction. In this paper, the optimization problem of obtaining the maximum likelihood 
estimates is equivalent to maximizing a real function (i.e., the log-likelihood) on a manifold 
(that is, the surface of the hypersphere). The goal of the Newton-like algorithm on manifolds 
is to obtain the solutions of 

gradl(c) = (9) 

where gradl(c) represents the gradient of the log-likelihood function I at the point c. The 
solutions to this equation correspond to critical points of the real function / on the surface of 
the hypersphere. The maximum likelihood estimate of c is a critical point of I. The Newton 
method on manifolds is an iterative algorithm defined by the following steps, which are from 
Absil et al. (2008): 

1. Select an initial point c . 

2. For k — 1,2,..., solve the Newton equation 

Hessl{c k )ri k = -gradl{c k ) (10) 
for the unknown r\ k in the tangent space at c k . 
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3. Set c k+l = i?c fc (^ fc ) where R £k is the retraction from the tangent space onto the manifold 
at c k . 

The algorithm terminates when the norm of the gradient or the norm of the difference 
gradl(c k ) — gradl{c k _]) is less than a prespecified error. For the case considered in this 
paper, we use differentiation rules of real functions of a complex vector to derive 

where P £ is the projection onto the tangent space. For the case of the hypersphere, Pc = 
j^I — cc H can be used. Note that the expected value of the gradient is equal to zero, and 
the Hessian matrix is obtained as 



Hessl(c) = PAgradl{c) = h^h ) ■ ( 12 ) 



Fisher's information matrix, i, which corresponds to the negative of the expected value of 
the Hessian, is equal to 

i = -E (Hessl(c)) = nPc. (13) 

Instead of using the Hessian in the Newton algorithm, we prefer to use Fisher's information 
matrix in the same way as in Fisher's method of scoring (Shao, 2003, Lange, 2004, Thisted, 
1988). The modified Newton algorithm consists of the following steps: 

1. Select an initial point c . 

2. For k — 1,2,..., solve the Fisher's scoring equation 

% = gradl(c k ) (14) 

and because 

l % = nP ^Lk = n % ( 15 ) 

and 

1 n e 

gradl(c k ) = — ~ n ^ (16) 

the Fisher's scoring equation has the following solution for rj , 
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3. Set c fe+1 = i^J = ^ 



where Rc k is a retraction from the tangent space onto the manifold for c k . In particular, we 
use 



^\\% + c k \ 
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Figure 1: Turtle data: Histogram of raw data and the best AIC (dashed line) and BIC (solid 
line) NNTS-fitted densities. 

We terminate the algorithm when the difference ||c fc+1 — c k \\ is less than a prespecified 
error. 

The modifications required to apply the proposed algorithm to grouped data is direct 
because the log-likelihood function to be maximized has the same basic form as the one we 
treated above. 

For the practical application of the algorithm, it is possible to work with vectors c with 
unit norms in order to avoid the correction terms related to the factor 77-. This is facilitated 
by the fact that it is equivalent to a modified likelihood that is obtained by multiplying 
the original likelihood by a constant factor. In relation to the initial point c , one can use 
a random initial point or the normalized average of the e k statistics. In our experience, 
using the normalized average of the e k statistics as an initial point has worked very well for 
different datasets. 

The proposed algorithm has been compared with results derived from sequential quadratic 
programming (SQP) and the Nelder-Mead optimization algorithm in many different datasets; 
the proposed algorithm shows much faster convergence for many different random initial 
points, and in contrast to SQP and the Neder-Mead algorithm, the proposed modified New- 
ton algorithm usually converges to the same point. The optimality properties of Newton 
algorithms on manifolds and its convergence properties are presented in Absil et.al. (2008), 
Manton (2002) and Balogh (2004). Of course, as in any iterative optimization algorithm, it 
is important to run the algorithm many times using different initial random points to try to 
find the global maximum of the log-likelihood function. 

The proposed algorithm is implemented using the statistical software R, particularly 
CircNNTSR v 1.0 (see Fernandez-Duran and Gregorio-Dommguez, 2009). 
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4 Examples 



The first example refers to univariate continuous circular data and consists of the directions 
taken by 76 turtles after treatment. This data set is taken from Fisher (1993, pp. 241), 
who in turn took it from Stephens (1969). The second example consists of the accumulated 
monthly number of deaths by suicide in England and Wales during the period 1982-1996 as 
an example of grouped data (Yip et al., 2000). 

4.1 Continuous Data 

Figure [1] presents the raw data histogram for the turtle data and the best-fitted NNTS mod- 
els. Table [T] presents the values of the log-likelihood, Akaike's Information Criterion (AIC), 
and the Bayesian Information Criterion (BIC) for NNTS models for M = 0, 1, . . . , 10. This 
dataset was analyzed previously by Fernandez-Duran (2004) using SQP to obtain the maxi- 
mum likelihood estimates. Contrary to SQP and the Nelder-Mead optimization method, the 
proposed Newton-like algorithm presents much faster and more stable convergence proper- 
ties. 





NNTS model 




M 


loglik {l u ) 


AIC 


BIC 





-139.68 


279.36 


279.36 


1 


-122.33 


256.66 


261.32 


2 


-107.97 


223.94 


233.26* 


3 


-107.94 


227.87 


241.86 


4 


-103.96 


223.92* 


242.57 


5 


-103.33 


226.66 


249.97 


6 


-102.72 


229.45 


257.42 


7 


-102.49 


232.98 


265.61 


8 


-100.88 


233.77 


271.06 


9 


-100.50 


237.00 


278.95 


10 


-100.27 


240.54 


287.15 



Table 1: Turtle data: Log-likelihood, AIC, and BIC values for the NNTS models fitted by 
the proposed Newton-like algorithm on the surface of a hypersphere. An asterisk (*) marks 
the best AIC and BIC models. 



4.2 Grouped Data 

Table [2] presents the raw monthly suicide data by sex taken from Yip et al. (2000). For 
grouped data, instead of using AIC or BIC criteria to select the best models among the 
considered models that use M = 0, 1, . . . , 6, we apply likelihood ratio tests to compare the 
maximized log-likelihood values for models with M = 0, 1, ... ,5, Im, with the maximized 
log-likelihood of the saturated model, Is, that corresponds, in this case, to an NNTS model 
with M = 6. In this strategy for model selection, —2{Im — Is) ls asymptotically distributed 
as a chi-squared random variable with 12 — IM degrees of freedom. The most parsimonious 
model was selected. Table |3] presents the values of the log-likelihood and likelihood ratio test 
statistics for NNTS models with M = 0,1, ... ,5. 
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Jan 


Feb 


Mar 


Apr 


May 


Jun 


Jul 


Aug 


Sep 


Oct 


Nov 


Dec 


Female 


1362 


1244 


1496 


1452 


1448 


1376 


1370 


1301 


1337 


1351 


1416 


1226 


Male 


3755 


3251 


3777 


3706 


3717 


3660 


3669 


3626 


3481 


3590 


3605 


3392 



Table 2: Monthly suicides in England and Wales for the period 1982-1996 (Yip et al, 2000). 



M 


Female 
loglik (l M ) -2(l M - l s ) 


Male 

loglik (l M ) -2(l M - l s ) 





-40698.76 


51.00 


-107403.60 


42.86 


1 


-40690.54 


34.56 


-107395.54 


26.74 


2 


-40683.13 


19.74* 


-107394.61 


24.88 


3 


-40680.95 


15.38 


-107393.90 


23.46 


4 


-40680.69 


14.86 


-107392.45 


20.56 


5 


-40676.68 


6.84 


-107384.24 


4.14 * 


6 


-40673.26 




-107382.17 





Table 3: Suicide data: Log-likelihood and likelihood ratio test statistics values for the NNTS 
models fitted by the proposed Newton-like algorithm on the surface of a hypersphere. An 
asterisk (*) marks the most parsimonious models according to likelihood ratio tests using a 
1% significance level. 

5 Conclusions 

A Newton-like algorithm on manifolds is developed to obtain the maximum likelihood esti- 
mates of the parameters of the NNTS family of distributions. Because the parameter space 
corresponds to the surface of a hypersphere, other optimization methods such as sequential 
quadratic programming (SQP) and the Nelder-Mead algorithm must address norm con- 
straints that make these optimization algorithms very slow. By working with optimization 
algorithms on manifolds, it is possible to avoid the use of constraints, thus making the pro- 
posed algorithm in this paper much faster and more efficient than these other methods. This 
is possible because the likelihood function of NNTS models can be conveniently expressed 
in terms of quadratic forms of the relevant parameters. The convenient use of the proposed 
Newton algorithm has been demonstrated in several datasets consisting of continuous and 
grouped observations. 
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